Title:Chocolate Sales Analysis – Uncovering Insights from 2022 Global Sales Data¶

Project Goal:¶

The goal of this project is to analyze chocolate sales data from the year 2022 to extract meaningful insights and trends. This includes validating and cleaning the dataset, performing univariate, bivariate, and multivariate statistical analysis, and visualizing key patterns. By understanding factors like top-performing salespersons, most popular products, seasonal sales trends, and country-wise performance, the project aims to support data-driven decision-making in marketing, supply chain, and product strategy for chocolate sales.

Importing Necessary Packages¶

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

import warnings
warnings.filterwarnings("ignore")
In [2]:
raw=pd.read_csv("Chocolate Sales @ Kg\Data.csv")
raw
Out[2]:
Sales Person Country Product Date Amount Boxes Shipped
0 Jehu Rudeforth UK Mint Chip Choco 04-Jan-22 $5,320 180
1 Van Tuxwell India 85% Dark Bars 01-Aug-22 $7,896 94
2 Gigi Bohling India Peanut Butter Cubes 07-Jul-22 $4,501 91
3 Jan Morforth Australia Peanut Butter Cubes 27-Apr-22 $12,726 342
4 Jehu Rudeforth UK Peanut Butter Cubes 24-Feb-22 $13,685 184
... ... ... ... ... ... ...
1089 Karlen McCaffrey Australia Spicy Special Slims 17-May-22 $4,410 323
1090 Jehu Rudeforth USA White Choc 07-Jun-22 $6,559 119
1091 Ches Bonnell Canada Organic Choco Syrup 26-Jul-22 $574 217
1092 Dotty Strutley India Eclairs 28-Jul-22 $2,086 384
1093 Karlen McCaffrey India 70% Dark Bites 23-May-22 $5,075 344

1094 rows × 6 columns

column info

column discription
Sales Person Name of the salesperson responsible for the transaction.
Country Sales region or store location where the transaction took place.
Product Name of the chocolate product sold.
Date The transaction date of the chocolate sale.
Amount Total revenue generated from the sale.
Boxes Shipped Number of chocolate boxes shipped in the order.
In [5]:
df=raw.copy()
                                                  Basic Checks
In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1094 entries, 0 to 1093
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Sales Person   1094 non-null   object
 1   Country        1094 non-null   object
 2   Product        1094 non-null   object
 3   Date           1094 non-null   object
 4   Amount         1094 non-null   object
 5   Boxes Shipped  1094 non-null   int64 
dtypes: int64(1), object(5)
memory usage: 51.4+ KB
In [7]:
# df.Amount=df.Amount.str.strip('$').str.replace(',','').str.strip()
# df['Amount']=df.Amount.astype('int')

# or
df['Amount'] = df['Amount'].replace('[\$,]', '', regex=True).astype(int)
In [8]:
df.rename({'Amount':'Amount in thousand ($)'},axis=1,inplace=True)
In [9]:
df['Date'] = pd.to_datetime(df['Date'], format='%d-%b-%y', errors='coerce')
In [10]:
df.isna().sum()
Out[10]:
Sales Person              0
Country                   0
Product                   0
Date                      0
Amount in thousand ($)    0
Boxes Shipped             0
dtype: int64
In [11]:
df
Out[11]:
Sales Person Country Product Date Amount in thousand ($) Boxes Shipped
0 Jehu Rudeforth UK Mint Chip Choco 2022-01-04 5320 180
1 Van Tuxwell India 85% Dark Bars 2022-08-01 7896 94
2 Gigi Bohling India Peanut Butter Cubes 2022-07-07 4501 91
3 Jan Morforth Australia Peanut Butter Cubes 2022-04-27 12726 342
4 Jehu Rudeforth UK Peanut Butter Cubes 2022-02-24 13685 184
... ... ... ... ... ... ...
1089 Karlen McCaffrey Australia Spicy Special Slims 2022-05-17 4410 323
1090 Jehu Rudeforth USA White Choc 2022-06-07 6559 119
1091 Ches Bonnell Canada Organic Choco Syrup 2022-07-26 574 217
1092 Dotty Strutley India Eclairs 2022-07-28 2086 384
1093 Karlen McCaffrey India 70% Dark Bites 2022-05-23 5075 344

1094 rows × 6 columns

1. Data Validation¶

In [12]:
def colValidate(df,col):
    print(f"column : {col}")
    print()
    print(f"Number of unique values in column: {df[col].nunique()}")
    print()
    print("Unique Values:")
    if df[col].nunique() >=100:
        for i in range(0,df[col].nunique(),100):
            print(df[col].unique()[i:i+100])
            print()
    else:
        print(df[col].unique())
        print()
    print(f"Data type of column:{df[col].dtype}")
    print()

Sales Person¶

In [13]:
colValidate(df,'Sales Person')
column : Sales Person

Number of unique values in column: 25

Unique Values:
['Jehu Rudeforth' 'Van Tuxwell' 'Gigi Bohling' 'Jan Morforth' 'Oby Sorrel'
 'Gunar Cockshoot' 'Brien Boise' 'Rafaelita Blaksland' 'Barr Faughny'
 'Mallorie Waber' 'Karlen McCaffrey' "Marney O'Breen" 'Beverie Moffet'
 'Roddy Speechley' 'Curtice Advani' 'Husein Augar' 'Kaine Padly'
 'Dennison Crosswaite' "Wilone O'Kielt" 'Andria Kimpton' 'Kelci Walkden'
 'Camilla Castle' 'Madelene Upcott' 'Dotty Strutley' 'Ches Bonnell']

Data type of column:object

Country¶

In [14]:
colValidate(df,'Country')
column : Country

Number of unique values in column: 6

Unique Values:
['UK' 'India' 'Australia' 'New Zealand' 'USA' 'Canada']

Data type of column:object

Product¶

In [15]:
colValidate(df,'Product')
column : Product

Number of unique values in column: 22

Unique Values:
['Mint Chip Choco' '85% Dark Bars' 'Peanut Butter Cubes'
 'Smooth Sliky Salty' '99% Dark & Pure' 'After Nines' '50% Dark Bites'
 'Orange Choco' 'Eclairs' 'Drinking Coco' 'Organic Choco Syrup'
 'Milk Bars' 'Spicy Special Slims' 'Fruit & Nut Bars' 'White Choc'
 'Manuka Honey Choco' 'Almond Choco' 'Raspberry Choco'
 'Choco Coated Almonds' "Baker's Choco Chips" 'Caramel Stuffed Bars'
 '70% Dark Bites']

Data type of column:object

Amount¶

In [16]:
colValidate(df,'Amount in thousand ($)')
column : Amount in thousand ($)

Number of unique values in column: 827

Unique Values:
[ 5320  7896  4501 12726 13685  5376  3080  3990  2835  4704  3703  1442
   168  8379  6790  4067  3017  8799  1085  6888  1267  4753  3003  7672
  1652  4025  9492  5061  1722 12446  4284  6839  2163  9583  2653   147
  3654  2443   938 14749  4781  6307  7602  9737  6979  4382  5243  4865
  8575    91 14798  2205   441  3556 16793 15421  4438  1603   273  3073
  6090 10255  2030 19453  9275  6181  9037 12313  5642  2800   959  2002
   609  1274  7595  4725  9681 14504   280    63  8001  4032  5859 11095
  7182  6881  7154  6188  4221   630  1743  2919    49  1827 13006  1064
 11571  5740  1456  5334]

[ 4151  8106   126  4697  7798  9408  1939 10906  5929  5579 10927   623
  6013  1505   476 11550 17318  1848  7273   854  6832  4606  4466 16114
  1561  8897  2464  4830  2765  7126  4263  3605 19929  5103  2541  5663
   392 10976  9282  8267  4116  2093  1015 12516  2758  3297  6048 10101
  2282  4361  3052  8029  5460  8911  7203 13083  2779  9058  3549  9436
 10283  5446  1043 12586  1687  5299  3213  5194 13706  6489  9324 10829
  8113  3269  7287  2583  9982  4795  9310   497   581  3472  4333  3325
 11718  2100  4018  6468  7238  6454  3115  6475  1162 14238  1428  8064
  9660  7357  6055  5124]

[ 3479   784 18991  1372   735  6538  3199  4571  5481  3136   252  3395
 14938  4053  5565  7161  8883  1351  3171  7910  3108  5075   378  7350
  3388 11837  5425  3752  7728  2296  4403  3192   448  4270    70  6762
  3745  2639  4389  2604 16569 14658   161  2807 13034  8484  2240 13447
  9422  8687  2415  6272  3122  8043 11662  8925  7294  1526  2688 14889
  2912  2632  6328  2527  1414  7714  4347  7490  2058  3640   455 10990
  5558  1876  5523 11200  5593  7882 10241  1379  3010  1540  6916  1232
   602 10185  2170  8673  4760  3374   420  2863  5936   889 16016 10479
  1575  8197   840  5691]

[ 4193  9016  9870  2576  3178  4676  5502  7462  5803 13888  6867  2317
  1218  4109  1953  6797  9226  5733  4312  1638 10815   987  4669  2905
  6986  1288  2345  3619 16982  8092  6993   637  6034   980  2821  3563
  2996  2436  5509 12992  3724  7133  8617  9198 11823  5775 13125 14287
 16233  5313  3577  3528   679  2450 10577  2597  2219 11319  5978  5327
  6020  5614  1736  6384 14539  3493   994  1554   966  4935 10024  2506
  6524  8148  3948  5271 12327  6167 18340  7014  7119 15491  5747  4550
  2191  7623  9023  3402 10507  7721  5033  1960   238  7756  2660 11564
  1365  4186  7406   112]

[ 8204  2611 15652  4074 12250  2366  6965  5292   588  4046  7042  6713
  6440 10885  2387  6678  4515  5852  1750  5782  2870  3094 12761  3696
  5222  8939  2156  2380  3339 14980  1512  6657  3836  8771   651  6706
  1421  8526  1435  7434 15316  2751  2786  2303 12271 11298 15855 12404
 10808  4858  7742 10983  6769   777  3843  6930  8393  3822  6342  6510
 10171  5908 10164  9716 22050  9989  4739  3185  8225 14301  1316  3486
 13930  8470    77  3381  4102  2975  4137  9541  5152 11116 13076 10213
  2485  8715     7  8904  4396 12068  9772 10458  6426  7504  5439 15547
 11956  2723 19327  8848]

[14336 16401  5173  2534   910  2331  2086  5012 13482  5621 10486 17626
  4494   105  3220  4802  5677   945  8757  4816  7532  4340  1799  1127
 10038  2926  6279   308  3500   658 12565  3955  8512   504  4256 13573
  6566 13503  8680   385  2961  1981  7959 10794  1897  2744  5474  6069
  6944 12173  5705  2492  3024  3437  1869  1225  1155  6811  6433  8169
  2275  3857  1463  7924  2898  9506  7175  1729  1589  5187  6223  9457
  2107  1862  6972 10220  3969  1547 10633 15057  9338 14525  5810  1582
   791  9100  9884  3780  4557  5796    84  4746  6237  7483  1309  2989
  9625 10031  6587  3311]

[15330 14028  8624  1197   483  9534  1694  2933  2044 12656  8722 15750
 19481  8099 13727  8659  8463  4914  3087  9205  1358  8498   700   644
   826   203 13356  5894  1617 10822 10724  2016  7588  1904  4844  4599
  9744  1400   364  7231  8309  3164   490 18032  5131  1141  7980  5229
  3423  4326  3997  5180   952 10969  5964  1302  4179  1946  4291  7196
 10437  3647  8995  7252 16380  4522   329  3458  4641  4627   868   210
  7007  5845  8428  4977  9527 10766  4879  2408  8981  5250  7560  3038
  1099  7413  5397 10325  2149 12362 12635  7315  1393  8022   371 16702
 13258 10192  7140  9835]

[  749  7770  3507  5600   721  6615  7420  9114  5404 14924  7091  6496
  3738  1190  9268  1029 12026 13405 14763 12894  8302  8183  1645  3143
  5873 12425  1715  8190  4991  5096  7063  4200 11137   301 17465 10143
  3626  3346  6321  9345 15008  7609 10332   819  5754    28  4956  2352
  3906  1813 11788  5768  2401 10122  2268   574  1792  9107 12187  5670
  2681  1533  9765  3318  4214  4690 13062 12488 14147 10199 11389  3584
 12481  8603  6559 18697 10150  4592  4158  5684   693   546  1036 13846
  2499 10689  2226 10262  3864  4900  8155  2478  2471  1666  8400  6139
  7077  9954  1260   469]

[  973  2520 10500  7952  4872  6594  1470  2674  7364  6818  6776 15099
  3612  3710  6111  2702  5887 10647  8589 11781   672 12558  8337  8134
  8491  3066  4410]

Data type of column:int32

  • All column data is validated and in proper

Duplicated rows Check & Handling¶

In [17]:
df[df.duplicated()]
Out[17]:
Sales Person Country Product Date Amount in thousand ($) Boxes Shipped
  • No duplicated rows found in data

2. EDA (Data Insights from stats methods)¶

In [18]:
data = df.copy()
In [19]:
data.head(2)
Out[19]:
Sales Person Country Product Date Amount in thousand ($) Boxes Shipped
0 Jehu Rudeforth UK Mint Chip Choco 2022-01-04 5320 180
1 Van Tuxwell India 85% Dark Bars 2022-08-01 7896 94
column type
Sales Person Categorical Nominal
Country Product Categorical Nominal
product Categorical Nominal
Date Date and time
Amount in thousand ($) Numerical Continuous
Boxes Shipped Numerical Discrete
  • Refering to above variable types we can use stats measures & visualizations accordingly to the above techniques,

2.2 Analysis

2.2.1 Uni-Variate Analysis - Study of Individual Column data

  • Descriptive Stats Univariate measures,
    • Numeric
      • Discrete
        • round(Mean), round(Median), Mode, Five Number Summary, Std, Skewness, Kurtosis
      • Continuous
          • Mean, Median, Five Number Summary, Std, Skewness, Kurtosis
    • Categorical & Boolean
      • nunique
      • unique
      • FDT
      • Mode
    • Date Time
      • Start Date, End Date, Diff between Start & End Date
  • Visualizations for Uni-Variate
    • Categorical: Comparission: Pie/Bar
    • Numerical: Distribution: Hist/Box/Density

Note on Visualizations¶

All visualizations in this project are created using Plotly, a powerful Python library for interactive charts and dashboards. Since Plotly generates dynamic, web-based visualizations, the charts and graphs can only be viewed when the code is executed (i.e., not as static images in a file).

In [20]:
data.head(2)
Out[20]:
Sales Person Country Product Date Amount in thousand ($) Boxes Shipped
0 Jehu Rudeforth UK Mint Chip Choco 2022-01-04 5320 180
1 Van Tuxwell India 85% Dark Bars 2022-08-01 7896 94

Taking Uni-Variate Descriptive Stats User-Defined Functions

In [21]:
pd.set_option('display.float_format', lambda x: '%.2f' % x) 

from simple_colors import *

############################ Numeric Continuous ############################
def ncstudy(df, col):
    print(green("#######################################################",['bold']))
    print(green("Taken Numeric Continuous Column:",['bold']), black(col,['bold']))
    print(green("#######################################################",['bold']))
    print()
    print(cyan("Descriptive Stats:",['bold']))
    print()
    print(blue("******** Measures of Central Tendancy ************", ['bold']))
    print(magenta("Mean:",['bold']), round(df[col].mean(),2))
    print(magenta("Median:",['bold']), df[col].median())
    print(magenta("Mode:",['bold']), df[col].mode()[0]) # Taking first value
    print()
    print(blue("******** Measures of Dispersion ************",['bold']))
    print(magenta("Range:",['bold']), df[col].max()-df[col].min())
    print(magenta("Variance:",['bold']), round(df[col].var(),2))
    print(magenta("Standard Deviation:",['bold']), round(df[col].std(),2))
    print(magenta("Five Number Summary:",['bold']))
    print(round(data[col].describe(),2)[['min','25%','50%','75%','max']])
    print()
    print(blue("******** Measures of Symmetry ************",['bold']))
    print(magenta("Skewness:",['bold']), round(df[col].skew(),2))
    print(magenta("Kurtosis:",['bold']), round(df[col].kurt(),2))
    print()
    print(cyan("Visualization:",['bold']))
    print()
    px.box(df[col], orientation='h', width=650, height=300).show()
    print()

############################## Numeric Discrete #################################
def ndstudy(df, col):
    print(green("#######################################################",['bold']))
    print(green("Taken Numeric Discrete Column:",['bold']), black(col,['bold']))
    print(green("#######################################################",['bold']))
    print()
    print(cyan("Uni-Variate Descriptive Stats:",['bold']))
    print()
    print("******** Measures of Central Tendancy ************")
    print(magenta("Mean:",['bold']), round(df[col].mean()))
    print(magenta("Median:",['bold']), round(df[col].median()))
    print(magenta("Mode:",['bold']), df[col].mode()[0]) # Taking first value
    print()
    print("******** Measures of Dispersion ************")
    print(magenta("Range:",['bold']), df[col].max()-df[col].min())
    print(magenta("Variance:",['bold']), round(df[col].var()))
    print(magenta("Standard Deviation:",['bold']), round(df[col].std()))
    print(magenta("Five Number Summary:",['bold']))
    print(round(data[col].describe())[['min','25%','50%','75%','max']])
    print()
    print("******** Measures of Symmetry ************")
    print(magenta("Skewness:",['bold']), round(df[col].skew(),2))
    print(magenta("Kurtosis:",['bold']), round(df[col].kurt(),2))
    print()
    print(cyan("Visualization:",['bold']))
    print()
    px.box(df[col], orientation='h', width=650, height=300).show()
    print()

############################# Categorical #######################################
def catstudy(df, col):
    print(green("#######################################################",['bold']))
    print(green("Taken Categorical Column:",['bold']), black(col,['bold']))
    print(green("#######################################################",['bold']))
    print()
    print(cyan("Uni-Variate Descriptive Stats:",['bold']))
    print()
    print(magenta("Number of Categories/Classes in column:",['bold']), df[col].nunique())
    print(magenta("Category Names:",['bold']))
    print(df[col].unique())
    print()
    print(magenta("Value Counts (FD) of each Category:",['bold']))
    print(df[col].value_counts())
    print()
    print(magenta("Value Counts of Each Class (FD) as Percentage:",['bold']))
    print(round((df[col].value_counts()/len(df))*100,2))
    print()
    print(magenta("Mode:",['bold']), df[col].mode()[0])
    print()
    print(cyan("Visualization:",['bold']))
    print()        
    print(black("Top Catgeories:", ['bold']))
    # Considering only top 10 categories for pie chart
    index = df[col].value_counts().sort_values(ascending=False)[0:10].index
    vals = df[col].value_counts().sort_values(ascending=False)[0:10].values
    fig = px.pie(names=index, values=vals, width=700, height=400)
    fig.update_traces(textposition='inside', textinfo='percent+label')
    fig.update_layout(showlegend=False)
    fig.show()
    print()
    
######################################## DateTime ######################################
def datestudy(df, col):
    print(green("#######################################################",['bold']))
    print(green("Taken Date Column:",['bold']), black(col,['bold']))
    print(green("#######################################################",['bold']))
    print()
    print(cyan("Uni-Variate Descriptive Stats:",['bold']))
    print()
    print(magenta("Start Date:",['bold']), df[col].min())
    print(magenta("End Date:",['bold']), df[col].max())
    print(magenta("Total Time Period (in Years):",['bold']), (df[col].max()-df[col].min()))
    print()
    print(cyan("Visualization:",['bold']))
    print()
    index = df[col].value_counts().index
    vals = df[col].value_counts().values
    px.scatter(x = index, y = vals, width=500, height=400).show()
    print()
In [22]:
for col in data.columns:
    if 'Year' in col:
        datestudy(data, col)
    elif data[col].dtype == object:
        catstudy(data, col)
    elif data[col].dtype == 'float64':
        ncstudy(data, col)
    elif data[col].dtype == 'int64':
        ndstudy(data, col)
#######################################################
Taken Categorical Column: Sales Person
#######################################################

Uni-Variate Descriptive Stats:

Number of Categories/Classes in column: 25
Category Names:
['Jehu Rudeforth' 'Van Tuxwell' 'Gigi Bohling' 'Jan Morforth' 'Oby Sorrel'
 'Gunar Cockshoot' 'Brien Boise' 'Rafaelita Blaksland' 'Barr Faughny'
 'Mallorie Waber' 'Karlen McCaffrey' "Marney O'Breen" 'Beverie Moffet'
 'Roddy Speechley' 'Curtice Advani' 'Husein Augar' 'Kaine Padly'
 'Dennison Crosswaite' "Wilone O'Kielt" 'Andria Kimpton' 'Kelci Walkden'
 'Camilla Castle' 'Madelene Upcott' 'Dotty Strutley' 'Ches Bonnell']

Value Counts (FD) of each Category:
Kelci Walkden          54
Brien Boise            53
Van Tuxwell            51
Beverie Moffet         50
Dennison Crosswaite    49
Oby Sorrel             49
Ches Bonnell           48
Karlen McCaffrey       47
Gigi Bohling           47
Curtice Advani         46
Kaine Padly            45
Madelene Upcott        45
Marney O'Breen         45
Barr Faughny           43
Jehu Rudeforth         43
Roddy Speechley        43
Gunar Cockshoot        43
Mallorie Waber         41
Jan Morforth           39
Andria Kimpton         39
Husein Augar           38
Dotty Strutley         36
Wilone O'Kielt         34
Rafaelita Blaksland    34
Camilla Castle         32
Name: Sales Person, dtype: int64

Value Counts of Each Class (FD) as Percentage:
Kelci Walkden         4.94
Brien Boise           4.84
Van Tuxwell           4.66
Beverie Moffet        4.57
Dennison Crosswaite   4.48
Oby Sorrel            4.48
Ches Bonnell          4.39
Karlen McCaffrey      4.30
Gigi Bohling          4.30
Curtice Advani        4.20
Kaine Padly           4.11
Madelene Upcott       4.11
Marney O'Breen        4.11
Barr Faughny          3.93
Jehu Rudeforth        3.93
Roddy Speechley       3.93
Gunar Cockshoot       3.93
Mallorie Waber        3.75
Jan Morforth          3.56
Andria Kimpton        3.56
Husein Augar          3.47
Dotty Strutley        3.29
Wilone O'Kielt        3.11
Rafaelita Blaksland   3.11
Camilla Castle        2.93
Name: Sales Person, dtype: float64

Mode: Kelci Walkden

Visualization:

Top Catgeories:
#######################################################
Taken Categorical Column: Country
#######################################################

Uni-Variate Descriptive Stats:

Number of Categories/Classes in column: 6
Category Names:
['UK' 'India' 'Australia' 'New Zealand' 'USA' 'Canada']

Value Counts (FD) of each Category:
Australia      205
India          184
USA            179
UK             178
Canada         175
New Zealand    173
Name: Country, dtype: int64

Value Counts of Each Class (FD) as Percentage:
Australia     18.74
India         16.82
USA           16.36
UK            16.27
Canada        16.00
New Zealand   15.81
Name: Country, dtype: float64

Mode: Australia

Visualization:

Top Catgeories:
#######################################################
Taken Categorical Column: Product
#######################################################

Uni-Variate Descriptive Stats:

Number of Categories/Classes in column: 22
Category Names:
['Mint Chip Choco' '85% Dark Bars' 'Peanut Butter Cubes'
 'Smooth Sliky Salty' '99% Dark & Pure' 'After Nines' '50% Dark Bites'
 'Orange Choco' 'Eclairs' 'Drinking Coco' 'Organic Choco Syrup'
 'Milk Bars' 'Spicy Special Slims' 'Fruit & Nut Bars' 'White Choc'
 'Manuka Honey Choco' 'Almond Choco' 'Raspberry Choco'
 'Choco Coated Almonds' "Baker's Choco Chips" 'Caramel Stuffed Bars'
 '70% Dark Bites']

Value Counts (FD) of each Category:
50% Dark Bites          60
Eclairs                 60
Smooth Sliky Salty      59
White Choc              58
Drinking Coco           56
Spicy Special Slims     54
Organic Choco Syrup     52
After Nines             50
85% Dark Bars           50
Fruit & Nut Bars        50
Milk Bars               49
Peanut Butter Cubes     49
99% Dark & Pure         49
Almond Choco            48
Raspberry Choco         48
Orange Choco            47
Mint Chip Choco         45
Manuka Honey Choco      45
Caramel Stuffed Bars    43
70% Dark Bites          42
Baker's Choco Chips     41
Choco Coated Almonds    39
Name: Product, dtype: int64

Value Counts of Each Class (FD) as Percentage:
50% Dark Bites         5.48
Eclairs                5.48
Smooth Sliky Salty     5.39
White Choc             5.30
Drinking Coco          5.12
Spicy Special Slims    4.94
Organic Choco Syrup    4.75
After Nines            4.57
85% Dark Bars          4.57
Fruit & Nut Bars       4.57
Milk Bars              4.48
Peanut Butter Cubes    4.48
99% Dark & Pure        4.48
Almond Choco           4.39
Raspberry Choco        4.39
Orange Choco           4.30
Mint Chip Choco        4.11
Manuka Honey Choco     4.11
Caramel Stuffed Bars   3.93
70% Dark Bites         3.84
Baker's Choco Chips    3.75
Choco Coated Almonds   3.56
Name: Product, dtype: float64

Mode: 50% Dark Bites

Visualization:

Top Catgeories:
#######################################################
Taken Numeric Discrete Column: Boxes Shipped
#######################################################

Uni-Variate Descriptive Stats:

******** Measures of Central Tendancy ************
Mean: 162
Median: 135
Mode: 24

******** Measures of Dispersion ************
Range: 708
Variance: 14773
Standard Deviation: 122
Five Number Summary:
min     1.00
25%    70.00
50%   135.00
75%   229.00
max   709.00
Name: Boxes Shipped, dtype: float64

******** Measures of Symmetry ************
Skewness: 1.11
Kurtosis: 1.16

Visualization:


Insights

  • Kelci Walkden is the top-performing salesperson, selling 54 unique product categories, showcasing a wide reach across customer preferences.

  • Brien Boise, Van Tuxwell, and Beverie Moffet follow closely with 53, 51, and 50 product categories respectively, indicating consistently high performance.

  • Barr Faughny, Jehu Rudeforth, Roddy Speechley, and Gunar Cockshoot — sold exactly 43 product categories, highlighting a tie in mid-tier performance.

  • The lowest-performing salesperson is Camilla Castle, with sales across only 32 product categories

  • Australia emerges as the top market with the highest number of chocolate box purchases (205 boxes), indicating strong regional demand.

  • India(184) is also in significant market but fall behind Australia in total box shipments.

  • The most purchased products are 50% Dark Bites and Eclairs, each with 60 units sold, suggesting these are customer favorites.

  • The least purchased product is Choco Coated Almonds, with only 39 units shipped.

2.2.2 Bi/Multi-Variate Analysis - Study of Data between two or more columns

In the Bi/Multi-Variate we have these variable combinations

In [ ]:
 
In [23]:
data.head(2)
Out[23]:
Sales Person Country Product Date Amount in thousand ($) Boxes Shipped
0 Jehu Rudeforth UK Mint Chip Choco 2022-01-04 5320 180
1 Van Tuxwell India 85% Dark Bars 2022-08-01 7896 94

Selecting specific columns combos from above based on business talks , to study data

Pure Numeric Mixed Columns Pure Categorical
Amount vs Boxes Shipped Sales Person vs Boxes Shipped Country vs Product
Month vs Boxes Shipped Country vs Box Shipped Sales Person vs Country
Amount vs Month Product vs Amount Sales Person vs Product
'' Product vs Box shipped ''

Pure Numeric

  • Descriptive Stats

    • Correlation coefficient
                                              Amount vs Boxes Shipped
In [24]:
print("Amount vs Boxes Shipped:")
display(round(data[['Boxes Shipped','Amount in thousand ($)']].corr(),2))
Amount vs Boxes Shipped:
Boxes Shipped Amount in thousand ($)
Boxes Shipped 1.00 -0.02
Amount in thousand ($) -0.02 1.00
In [25]:
px.scatter(data, x='Boxes Shipped', y='Amount in thousand ($)').show()
  • The correlation between "Amount in thousand ($)" and "Boxes Shipped" is -0.02, which is very close to 0.

  • This means there is almost no linear relationship between the total sales amount and the number of boxes shipped.

                                                 Month vs Boxes Shipped
In [29]:
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")

df["Month"] = df["Date"].dt.strftime('%b')  
df["Month_num"] = df["Date"].dt.month     
In [30]:
monthly_boxes = df.groupby(["Month_num", "Month"])["Boxes Shipped"].count().reset_index()

monthly_boxes = monthly_boxes.sort_values("Month_num")
In [31]:
fig = px.bar(monthly_boxes, x="Month", y="Boxes Shipped", title="Monthly Chocolate Box Shipments", text_auto=True)

fig.update_layout(xaxis_title="Month", yaxis_title="Total Boxes Shipped")
fig.show()

Insights

  • June recorded the highest number of boxes shipped, totaling 163. And February had the lowest number of boxes shipped, with only 110.
                                               Amount vs Month
In [32]:
monthly_amount = df.groupby(["Month_num", "Month"])["Amount in thousand ($)"].sum().reset_index()
monthly_amount = monthly_amount.sort_values("Month_num")
In [33]:
fig = px.line(monthly_amount, x="Month", y="Amount in thousand ($)", title="Monthly Chocolate Sales",markers=True,text="Amount in thousand ($)")
fig.update_layout(xaxis_title="Month", yaxis_title="Amount (in Thousand $)")
fig.update_traces(marker=dict(size=9, color='blue', symbol='circle'),texttemplate='%{text:.0s}',textposition="top center")
fig.show()

Insights

  • January had the highest sales, with chocolates purchased worth approximately $896.1K (almost $900K). And April saw the lowest sales, with an amount of $674.05K.

Mixed Columns

                                     Sales Person vs Boxes Shipped
In [34]:
mydata=df.groupby("Sales Person")["Boxes Shipped"].sum().reset_index().sort_values("Boxes Shipped",ascending=False)
fig = px.bar(mydata,x="Sales Person", y="Boxes Shipped", title="Sales Person vs Boxes Shipped",text_auto='.2s')
fig.update_traces(textfont_size=9)  
fig.show()

Insights

  • Kelci Walkden shipped the highest number of boxes, totaling 54 and Camilia Castle shipped the fewest boxes, with a total of 32.
                                          Country vs Boxes Shipped
In [35]:
mydata=df.groupby("Country")["Boxes Shipped"].count().reset_index()
fig = px.bar(mydata,x="Country", y="Boxes Shipped", title="Country vs Boxes Shipped",text_auto='.2s')
fig.show()

Insights

  • Australia is the top-performing country in terms of boxes shipped.
                                                   Product vs Amount
In [36]:
mydata=df.groupby("Product")["Amount in thousand ($)"].sum().reset_index().sort_values("Amount in thousand ($)",ascending=False)
fig = px.bar(mydata,x="Product", y="Amount in thousand ($)", title="Product vs Amount",text_auto='.2s')
fig.update_layout(xaxis_tickangle=-45)
fig.update_traces(textfont_size=9)
fig.show()
                                            Product vs Boxes Shipped

Insights

  • Smooth Silky Salty is the most purchased product, with a total sales amount of $349.69K (almost $350K).

  • 70% Dark Bites recorded the lowest total amount, with $211.61K in sales.

In [37]:
mydata=df.groupby("Product")["Boxes Shipped"].sum().reset_index().sort_values("Boxes Shipped", ascending=False)
fig = px.bar(mydata,x="Product", y="Boxes Shipped", title="Product vs Boxes Shipped",text_auto='.2s')
fig.update_layout(xaxis_tickangle=-45)
fig.update_traces(textfont_size=10)
fig.show()

Insights

  • 50% Dark Bites had the highest number of boxes shipped, totaling 9.792K.

  • Choco Coated Almonds had the lowest number of boxes shipped, with 6.464K.

                                                Country vs Amount
In [38]:
df['Amount in thousand ($)'] = df['Amount in thousand ($)'].astype(float)
country_sales = df.groupby('Country')['Amount in thousand ($)'].sum().reset_index()

fig = px.pie(country_sales,names='Country',values='Amount in thousand ($)',title='Total Chocolate Sales by Country',hole=0.3) 
fig.show()

Insights

  • Australia has the highest total chocolate sales among all countries, contributing the largest share in revenue.

  • India, USA and the UK follow next, but with comparatively lower sales amounts.

Categorical

                                                  Country vs Product
In [39]:
fig = px.histogram(df, x="Country", color="Product", barmode="group",title="Country vs Product Distribution")
fig.show()

Insights

  • Australia is the top country by selling more products (16 products of 50% Dark Bites is distributed)

Multi-Variate Analysis

Pure Numeric

  • Desc Stats
In [40]:
data.head()
Out[40]:
Sales Person Country Product Date Amount in thousand ($) Boxes Shipped
0 Jehu Rudeforth UK Mint Chip Choco 2022-01-04 5320 180
1 Van Tuxwell India 85% Dark Bars 2022-08-01 7896 94
2 Gigi Bohling India Peanut Butter Cubes 2022-07-07 4501 91
3 Jan Morforth Australia Peanut Butter Cubes 2022-04-27 12726 342
4 Jehu Rudeforth UK Peanut Butter Cubes 2022-02-24 13685 184
In [41]:
data.corr(numeric_only=True)
Out[41]:
Amount in thousand ($) Boxes Shipped
Amount in thousand ($) 1.00 -0.02
Boxes Shipped -0.02 1.00
In [42]:
lower = np.triu(data.corr(numeric_only=True))
sns.heatmap(data.corr(numeric_only=True), annot=True, cmap='viridis', mask=lower)
Out[42]:
<Axes: >

Insights

  • The correlation between "Amount in thousand ($)" and "Boxes Shipped" is -0.02, which is very close to 0.

  • This means there is almost no linear relationship between the total sales amount and the number of boxes shipped.

  • In other words, shipping more boxes does not necessarily lead to higher revenue, possibly because:

    • Some products are low cost but shipped in large quantities.

      • Others may be high-priced but shipped in smaller numbers.
In [43]:
sns.pairplot(data)
Out[43]:
<seaborn.axisgrid.PairGrid at 0x22110eaea50>

Pure Categorical

  • Desc Stats
In [45]:
pd.crosstab(data['Country'], data['Product'])
Out[45]:
Product 50% Dark Bites 70% Dark Bites 85% Dark Bars 99% Dark & Pure After Nines Almond Choco Baker's Choco Chips Caramel Stuffed Bars Choco Coated Almonds Drinking Coco ... Manuka Honey Choco Milk Bars Mint Chip Choco Orange Choco Organic Choco Syrup Peanut Butter Cubes Raspberry Choco Smooth Sliky Salty Spicy Special Slims White Choc
Country
Australia 16 9 8 11 7 11 9 7 6 12 ... 8 11 7 8 12 6 11 8 8 11
Canada 6 8 6 6 7 8 5 6 10 9 ... 8 7 4 8 10 10 6 12 11 9
India 8 7 9 8 10 9 5 9 7 8 ... 4 6 10 7 8 9 7 11 14 7
New Zealand 7 6 12 5 12 5 8 6 3 6 ... 8 6 14 7 10 8 7 9 8 9
UK 12 6 6 11 5 8 9 9 9 12 ... 8 9 6 6 6 9 6 11 5 12
USA 11 6 9 8 9 7 5 6 4 9 ... 9 10 4 11 6 7 11 8 8 10

6 rows × 22 columns

In [46]:
px.sunburst(df, path=["Country", "Product"], values="Boxes Shipped")
In [ ]:
 

Overall Insights On DataSet:¶

Single Column Study:

  • Kelci Walkden is the top-performing salesperson, selling 54 unique product categories, showcasing a wide reach across customer preferences.

  • Brien Boise, Van Tuxwell, and Beverie Moffet follow closely with 53, 51, and 50 product categories respectively, indicating consistently high performance.

  • Barr Faughny, Jehu Rudeforth, Roddy Speechley, and Gunar Cockshoot — sold exactly 43 product categories, highlighting a tie in mid-tier performance.

  • The lowest-performing salesperson is Camilla Castle, with sales across only 32 product categories

  • Australia emerges as the top market with the highest number of chocolate box purchases (205 boxes), indicating strong regional demand.

  • India(184) is also in significant market but fall behind Australia in total box shipments.

  • The most purchased products are 50% Dark Bites and Eclairs, each with 60 units sold, suggesting these are customer favorites.

  • The least purchased product is Choco Coated Almonds, with only 39 units shipped.

Study of Data between two or more columns

  • Smooth Silky Salty is the most purchased product, with a total sales amount of $349.69K (almost $350K).
  • 70% Dark Bites recorded the lowest total amount, with $211.61K in sales.
  • June recorded the highest number of boxes shipped, totaling 163. And February had the lowest number of boxes shipped, with only 110.

  • January had the highest sales, with chocolates purchased worth approximately $896.1K (almost $900K). And April saw the lowest sales, with an amount of $674.05K.

  • 50% Dark Bites had the highest number of boxes shipped, totaling 9.792K.

  • Choco Coated Almonds had the lowest number of boxes shipped, with 6.464K.

  • Australia has the highest total chocolate sales among all countries, contributing the largest share in revenue.

  • India, USA and the UK follow next, but with comparatively lower sales amounts.

In [ ]: